Applications of YK Algorithm to the Internet Transmission of Web-Data: Implementation Issues and Modifications
Abstract
Recently, Yang and Kieffer proposed a novel lossless grammar-based data compression algorithm, called the YK algorithm, in which a greedy sequential grammar transform is applied to the original data to construct an irreducible context-free grammar, which is then encoded indirectly using an arithmetic coder. The YK algorithm has been shown to be universal for the class of stationary, ergodic sources. Experimental results show that it significantly outperforms other commonly used sequential lossless data compression algorithms, such as Lempel-Ziv-type codes. Moreover, the YK algorithm is effective on a virtually unlimited range of data sizes. The basic implementation of the YK encoder consists of a sequentially iterated application of three fundamental steps: parsing, arithmetic encoding, and updating. The parsing operation searches for the longest prefix of the remaining part of the original data sequence that is representable by the current grammar. The arithmetic encoding operation encodes the parsed phrase using frequency counts over a dynamic alphabet. The updating operation uses the parsed substring to update the grammar and the frequency counts. This paper proposes five modifications of the basic YK algorithm, motivated by applications of the algorithm to the Internet transmission of web data:
1. Fast YK encoder: The parsing operation is a major step of the YK algorithm and can account for a significant portion of the encoder's overall running time. A variant of the trie data structure, which captures the structure of the context-free grammar in a non-redundant manner, is proposed for fast parsing. This is applicable to real-time compression of IP datagrams, particularly at high data rates.
2. Pre-defined source statistics: Known source statistics can be exploited to improve compression efficiency, which is particularly effective for small IP datagrams with a known structure, such as NNTP datagrams.
3. Pre-defined grammar: Starting from a "typical" pre-defined grammar can significantly improve compression efficiency for applications such as HTML web-page compression, because HTML responses originating at a given server tend to have identical layout and similar content, in addition to containing standard HTML tags and keywords.
4. Memory-constrained implementation: During YK compression, the grammar continues to grow with the length of the data sequence and can potentially exhaust the memory available in the system. This paper proposes a way to keep the memory requirement in check by reusing variables in the grammar once a user-chosen limit on grammar size is reached.
5. Error handling capability: The paper identifies all possible contingencies that can arise when an erroneous bit-stream is fed to the YK decoder, and provides explicit ways to handle them. This is important in applications where compressed IP datagrams are transmitted over unreliable links.
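To make the parsing step concrete, the following is a minimal illustrative sketch (not the paper's implementation) of greedy longest-prefix parsing with a trie. In the actual YK encoder the trie would index the strings derivable from the grammar's variables and be updated as the grammar grows; here, for simplicity, it is built once from a plain set of phrases, and an unmatched position falls back to emitting a single symbol.

```python
class TrieNode:
    """One trie node; phrase_id marks a phrase ending at this node."""
    def __init__(self):
        self.children = {}
        self.phrase_id = None

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, phrase, phrase_id):
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.phrase_id = phrase_id

    def longest_prefix(self, data, start):
        """Return (length, phrase_id) of the longest phrase matching
        data[start:], or (1, None) to fall back to a single symbol."""
        node, best, i = self.root, (1, None), start
        while i < len(data) and data[i] in node.children:
            node = node.children[data[i]]
            i += 1
            if node.phrase_id is not None:
                best = (i - start, node.phrase_id)
        return best

def greedy_parse(data, phrases):
    """Greedily split data into the longest known phrases, left to right."""
    trie = Trie()
    for pid, p in enumerate(phrases):
        trie.insert(p, pid)
    parsed, i = [], 0
    while i < len(data):
        length, pid = trie.longest_prefix(data, i)
        parsed.append(phrases[pid] if pid is not None else data[i:i + length])
        i += length
    return parsed

# With phrases "ab" and "aba", the greedy parse of "abab" takes the
# longer match "aba" first, then falls back to the single symbol "b".
print(greedy_parse("abab", ["ab", "aba"]))  # ['aba', 'b']
```

Because each input symbol is examined at most once per match attempt, the parse runs in time roughly linear in the input length, which is what makes the trie attractive for real-time compression at high data rates.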
Publication date: 2000